Lower Bound On the Computational Complexity of Discounted Markov Decision Problems

Authors

  • Yichen Chen
  • Mengdi Wang
Abstract

We study the computational complexity of the infinite-horizon discounted-reward Markov Decision Problem (MDP) with a finite state space S and a finite action space A. We show that any randomized algorithm needs running time at least Ω(|S|²|A|) to compute an ε-optimal policy with high probability. We consider two variants of the MDP in which the input is given in specific data structures, including arrays of cumulative probabilities and binary trees of transition probabilities. For these cases, we show that the complexity lower bound reduces to Ω(|S||A|). These results reveal the surprising observation that the computational complexity of the MDP depends on the data structure of the input.
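
To see why the input representation matters, consider sampling a next state when each row of the transition matrix is stored as an array of cumulative probabilities: one uniform draw plus a binary search yields the next state in O(log |S|) time, versus the Θ(|S|) work needed to process a raw probability vector. The sketch below is only an illustration of this idea, not code from the paper; the function name sample_next_state is ours.

    import bisect
    import random

    def sample_next_state(cumulative, u=None):
        # cumulative[k] holds P(next state <= k | current state, action),
        # so the array is nondecreasing and its last entry is 1.0.
        if u is None:
            u = random.random()  # uniform draw in [0, 1)
        # The smallest index k with cumulative[k] > u is the sampled state,
        # found by binary search in O(log |S|) time.
        return bisect.bisect_right(cumulative, u)

    # Example row: three successor states with probabilities 0.2, 0.5, 0.3.
    row = [0.2, 0.7, 1.0]
    counts = [0, 0, 0]
    for _ in range(100_000):
        counts[sample_next_state(row)] += 1
    print(counts)  # roughly proportional to [0.2, 0.5, 0.3]

A binary tree storing partial sums of a row supports the same O(log |S|) sampling while also allowing individual transition probabilities to be updated in O(log |S|) time.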

Similar articles

Accelerated decomposition techniques for large discounted Markov decision processes

Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...

Near-optimal PAC bounds for discounted MDPs

We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finite-state discounted Markov Decision Processes (MDPs). We prove a new bound for a modified version of Upper Confidence Reinforcement Learning (UCRL) with only cubic dependence on the horizon. The bound is unimprovable in all parameters except the size of the state/action space, where it depends lin...

PAC Bounds for Discounted MDPs

We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finite-state discounted Markov Decision Processes (MDPs). We prove a new bound for a modified version of Upper Confidence Reinforcement Learning (UCRL) with only cubic dependence on the horizon. The bound is unimprovable in all parameters except the size of the state/action space, where it depends line...

Optimistic Planning in Markov Decision Processes Using a Generative Model

We consider the problem of online planning in a Markov decision process with discounted rewards for any given initial state. We consider the PAC sample complexity problem of computing, with probability 1−δ, an ε-optimal action using the smallest possible number of calls to the generative model (which provides reward and next-state samples). We design an algorithm, called StOP (for StochasticOpt...

A Single Machine Sequencing Problem with Idle Insert: Simulated Annealing and Branch-and-Bound Methods

In this paper, a single machine sequencing problem is considered in order to find the sequence of jobs minimizing the sum of the maximum earliness and tardiness with idle times (n/1/I/ETmax). Due to the time complexity function, this sequencing problem belongs to a class of NP-hard ones. Thus, a special design of a simulated annealing (SA) method is applied to solve such a hard problem. To co...

Journal:
  • CoRR

Volume: abs/1705.07312
Issue: -
Pages: -
Publication date: 2017